Gold
Rethinking Tokenization for Rich Morphology: The Dominance of Unigram over BPE and Morphological Alignment
Vemula, Saketh Reddy, Dandapat, Sandipan, Sharma, Dipti Misra, Krishnamurthy, Parameswari
The relationship between tokenizer algorithm (e.g., Byte-Pair Encoding (BPE), Unigram), morphological alignment, tokenization quality (e.g., compression efficiency), and downstream performance remains largely unclear, particularly for languages with complex morphology. In this paper, we conduct a comprehensive evaluation of tokenizers using small-sized BERT models -- from pre-training through fine-tuning -- for Telugu (agglutinative), along with preliminary evaluation in Hindi (primarily fusional with some agglutination) and English (fusional). To evaluate morphological alignment of tokenizers in Telugu, we create a dataset containing gold morpheme segmentations of 600 derivational and 7000 inflectional word forms. Our experiments reveal two key findings for Telugu. First, the choice of tokenizer algorithm is the most significant factor influencing performance, with Unigram-based tokenizers consistently outperforming BPE across most settings. Second, while better morphological alignment shows a moderate, positive correlation with performance on text classification and structure prediction tasks, its impact is secondary to the tokenizer algorithm. Notably, hybrid approaches that use morphological information for pre-segmentation significantly boost the performance of BPE, though not Unigram. Our results further showcase the need for comprehensive intrinsic evaluation metrics for tokenizers that could explain downstream performance trends consistently.
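One common way to score morphological alignment against gold morpheme segmentations is boundary-level F1. The sketch below illustrates that general idea only; the abstract does not say which alignment metric the authors use, and the function names `boundaries` and `boundary_f1` are hypothetical. An English word is used to keep the example readable.

```python
def boundaries(segments):
    """Return the set of internal cut positions (character offsets) of a segmented word."""
    cuts, pos = set(), 0
    for seg in segments[:-1]:  # the end of the last segment is not an internal boundary
        pos += len(seg)
        cuts.add(pos)
    return cuts


def boundary_f1(pred_segments, gold_segments):
    """Boundary F1 between a tokenizer's split and a gold morpheme split of the same word.

    Assumption: both segmentations concatenate to the same surface string.
    """
    pred, gold = boundaries(pred_segments), boundaries(gold_segments)
    if not pred and not gold:
        return 1.0  # both leave the word unsplit, so they agree
    true_pos = len(pred & gold)
    precision = true_pos / len(pred) if pred else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Gold split un|happi|ness versus a tokenizer split unhap|pi|ness
score = boundary_f1(["unhap", "pi", "ness"], ["un", "happi", "ness"])
```

Note that `len` counts Unicode code points, so for scripts with combining characters (Telugu included) offsets would need to be computed consistently for both segmentations.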
- Asia > Singapore (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > Sweden > Vaestra Goetaland > Gothenburg (0.04)
- (15 more...)
AI Agent Architecture for Decentralized Trading of Alternative Assets
Borjigin, Ailiya, He, Cong, Lee, Charles CC, Zhou, Wei
Decentralized trading of real-world alternative assets (e.g., gold) requires bridging physical asset custody with blockchain systems while meeting strict requirements for compliance, liquidity, and risk management. We present a research-oriented architecture, GoldMine OS, that employs multiple specialized AI agents to automate and secure the tokenization and exchange of physical gold into a blockchain-based stablecoin ("OZ"). We detail the design of four cooperative agents (for Compliance, Token Issuance, Market-Making, and Risk Control) and a coordinating core, and we evaluate the system through both simulation and a controlled pilot deployment. In experiments, the prototype achieves on-demand token issuance in under 1.2 s, a speed-up of over 100× compared to traditional manual workflows. The integrated Market-Making agent provides tight liquidity (spreads often <0.5%) even under volatile market conditions. Through fault injection tests, we demonstrate the system's resilience: an oracle price spoofing attack is detected and mitigated within 10 s, and a simulated vault mis-reporting triggers an immediate halt of issuances with minimal impact on users. Our results indicate that an AI-agent-based decentralized exchange for alternative assets can meet rigorous performance and safety requirements. We discuss the broader implications for democratizing access to traditionally illiquid assets and outline how our governance model (multi-signature agent updates and on-chain community voting on risk parameters) ensures ongoing transparency, adaptability, and formal assurance of system integrity. Tokenizing real-world assets (RWAs) like precious metals on blockchains promises to democratize access to alternative investments, but it raises significant challenges in trust, compliance, and market stability [1] [2].
For instance, gold-backed cryptocurrencies such as PAX Gold (PAXG) and Tether Gold (XAUT) peg digital tokens to physical gold reserves, yet they rely heavily on centralized processes for custody and compliance [2]. Achieving a truly decentralized yet regulatorily compliant trading platform for assets like gold remains an open problem. Key hurdles include ensuring that on-chain token supply always mirrors off-chain reserves (requiring robust proof-of-reserve mechanisms), automating complex compliance checks (KYC/AML) in a user-friendly manner, providing continuous liquidity in thinly-traded assets, and guarding against failures of external data sources (the well-known oracle problem [3]). In this paper, we address these challenges by designing and evaluating GoldMine OS, an AI-driven multi-agent architecture for decentralized trading of gold-backed tokens.
- Information Technology (1.00)
- Banking & Finance > Trading (1.00)
- Materials > Metals & Mining > Gold (0.89)
Predicting Stock Market Crash with Bayesian Generalised Pareto Regression
This paper develops a Bayesian Generalised Pareto Regression (GPR) model to forecast extreme losses in Indian equity markets, with a focus on the Nifty 50 index. Extreme negative returns, though rare, can cause significant financial disruption, and accurate modelling of such events is essential for effective risk management. Traditional Generalised Pareto Distribution (GPD) models often ignore market conditions; in contrast, our framework links the scale parameter to covariates using a log-linear function, allowing tail risk to respond dynamically to market volatility. We examine four prior choices for Bayesian regularisation of regression coefficients: Cauchy, Lasso (Laplace), Ridge (Gaussian), and Zellner's g-prior. Simulation results suggest that the Cauchy prior delivers the best trade-off between predictive accuracy and model simplicity, achieving the lowest RMSE, AIC, and BIC values. Empirically, we apply the model to large negative returns (exceeding 5%) in the Nifty 50 index. Volatility measures from the Nifty 50, S&P 500, and gold are used as covariates to capture both domestic and global risk drivers. Our findings show that tail risk increases significantly with higher market volatility. In particular, both S&P 500 and gold volatilities contribute meaningfully to crash prediction, highlighting global spillover and flight-to-safety effects. The proposed GPR model offers a robust and interpretable approach for tail risk forecasting in emerging markets. It improves upon traditional EVT-based models by incorporating real-time financial indicators, making it useful for practitioners, policymakers, and financial regulators concerned with systemic risk and stress testing.
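The log-linear link described above can be sketched in a few lines: the scale is sigma_i = exp(beta0 + beta · x_i), which keeps it positive for any coefficient values, and each exceedance contributes a standard GPD log-density. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, the shape parameter xi is assumed constant across observations, and the priors on the coefficients (Cauchy, Laplace, Gaussian, g-prior) are omitted.

```python
import math


def gpd_scale(beta0, beta, x):
    """Log-linear link: sigma = exp(beta0 + beta . x), positive by construction."""
    return math.exp(beta0 + sum(b * xj for b, xj in zip(beta, x)))


def gpd_log_density(y, sigma, xi):
    """Log-density of the Generalised Pareto Distribution at an exceedance y >= 0."""
    if y < 0 or sigma <= 0:
        return -math.inf
    if abs(xi) < 1e-12:  # xi -> 0 limit is the exponential distribution
        return -math.log(sigma) - y / sigma
    z = 1.0 + xi * y / sigma
    if z <= 0:  # y lies outside the support for this (sigma, xi)
        return -math.inf
    return -math.log(sigma) - (1.0 / xi + 1.0) * math.log(z)


def log_likelihood(ys, xs, beta0, beta, xi):
    """Sum of GPD log-densities, each with its own covariate-driven scale."""
    return sum(
        gpd_log_density(y, gpd_scale(beta0, beta, x), xi)
        for y, x in zip(ys, xs)
    )
```

In a Bayesian fit, a log-prior on `beta` would be added to `log_likelihood` to form the unnormalised log-posterior. A positive coefficient on a volatility covariate makes `gpd_scale` grow with volatility, which is how such a model lets tail risk respond to market conditions.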
- North America > United States > North Carolina (0.04)
- North America > United States > New York (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- (2 more...)
- Banking & Finance > Trading (1.00)
- Materials > Metals & Mining > Gold (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.69)
Drones, gold, and threats: Sudan's war raises regional tensions
On May 4, Sudan's paramilitary Rapid Support Forces (RSF) launched a barrage of suicide drones at Port Sudan, the army's de facto wartime capital on the Red Sea. The Sudanese Armed Forces (SAF) accused foreign actors of supporting the RSF's attacks and even threatened to sever ties with one of its biggest trading partners. The RSF surprised many with the strikes. It had used drones before, but never hit targets as far away as Port Sudan, which used to be a haven, until last week. "The strikes … led to a huge displacement from the city. Many people left Port Sudan," Aza Aera, a local relief worker, told Al Jazeera.
- Africa > Sudan > Red Sea State > Port Sudan (0.74)
- Asia > Middle East > UAE (0.32)
- Indian Ocean > Red Sea (0.25)
- (7 more...)
- Government > Military > Army (0.44)
- Materials > Metals & Mining > Gold (0.31)
Why 23andMe's Genetic Data Could Be a 'Gold Mine' for AI Companies
But any AI-related company attempting to acquire 23andMe would run significant reputational risks. Many people are horrified by the thought that they surrendered their genetic data to trace their ancestry, only for it to now be potentially used in ways they never consented to. "Anybody touching this data is running a risk," Kumar, who is the director of Fox's Center for Business Analytics and Disruptive Technologies, says. "But at the same time, not touching it, they might be losing on something big as well."
- Health & Medicine > Pharmaceuticals & Biotechnology (0.86)
- Materials > Metals & Mining > Gold (0.40)
Low-Confidence Gold: Refining Low-Confidence Samples for Efficient Instruction Tuning
Cai, Hongyi, Li, Jie, Dong, Wenzhen
The effectiveness of instruction fine-tuning for Large Language Models is fundamentally constrained by the quality and efficiency of training datasets. This work introduces Low-Confidence Gold (LCG), a novel filtering framework that employs centroid-based clustering and confidence-guided selection for identifying valuable instruction pairs. Through a semi-supervised approach using a lightweight classifier trained on representative samples, LCG curates high-quality subsets while preserving data diversity. Experimental evaluation demonstrates that models fine-tuned on LCG-filtered subsets of 6K samples achieve superior performance compared to existing methods, with substantial improvements on MT-bench and consistent gains across comprehensive evaluation metrics. The framework's efficacy while maintaining model performance establishes a promising direction for efficient instruction tuning.
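The confidence-guided selection step can be illustrated as follows. This is a sketch of one plausible reading of the abstract, not the LCG algorithm itself: it assumes cluster assignments and classifier probabilities have already been computed upstream, and it keeps the least-confident samples from every cluster so that the subset stays diverse. The name `select_low_confidence` and its parameters are hypothetical.

```python
from collections import defaultdict


def select_low_confidence(probs, clusters, per_cluster):
    """Pick the `per_cluster` least-confident samples from each cluster.

    probs:    one list of class probabilities per sample (assumed classifier output)
    clusters: one cluster id per sample (assumed clustering output)
    Returns the selected sample indices in ascending order.
    """
    by_cluster = defaultdict(list)
    for idx, (p, c) in enumerate(zip(probs, clusters)):
        # Confidence is taken to be the highest predicted class probability.
        by_cluster[c].append((max(p), idx))

    selected = []
    for members in by_cluster.values():
        members.sort()  # ascending confidence; ties broken by sample index
        selected.extend(idx for _, idx in members[:per_cluster])
    return sorted(selected)


probs = [[0.9, 0.1], [0.55, 0.45], [0.6, 0.4], [0.99, 0.01]]
clusters = [0, 0, 1, 1]
chosen = select_low_confidence(probs, clusters, per_cluster=1)
```

Selecting per cluster rather than globally is a design choice made here to reflect the abstract's emphasis on preserving data diversity; a global ranking by confidence could draw most of the subset from a single cluster.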
Can ChatGPT Overcome Behavioral Biases in the Financial Sector? Classify-and-Rethink: Multi-Step Zero-Shot Reasoning in the Gold Investment
Liu, Shuoling, Jia, Gaoguo, Jiang, Yuhang, Chen, Liyuan, Yang, Qiang
Large Language Models (LLMs) have achieved remarkable success recently, displaying exceptional capabilities in creating understandable and organized text. These LLMs have been utilized in diverse fields, such as clinical research, where domain-specific models like Med-PaLM have achieved human-level performance. Recently, researchers have employed advanced prompt engineering to enhance the general reasoning ability of LLMs. Despite the remarkable success of zero-shot Chain-of-Thought (CoT) prompting in solving general reasoning tasks, the potential of these methods has received limited attention in financial reasoning tasks. To address this issue, we explore multiple prompt strategies and incorporate semantic news information to improve LLMs' performance on financial reasoning tasks. To the best of our knowledge, we are the first to explore this important issue by applying ChatGPT to gold investment. In this work, our aim is to investigate the financial reasoning capabilities of LLMs and their capacity to generate logical and persuasive investment opinions. We use ChatGPT, one of the most powerful recent LLMs, and prompt engineering to achieve this goal. Our research focuses on understanding the ability of LLMs in sophisticated analysis and reasoning within the context of investment decision-making. Our study finds that ChatGPT with a CoT prompt can provide more explainable predictions and overcome behavioral biases, which is crucial in finance-related tasks, and can achieve higher investment returns.
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Hong Kong (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (4 more...)
- Materials > Metals & Mining > Gold (1.00)
- Banking & Finance > Trading (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
An Adaptive Framework for Generating Systematic Explanatory Answer in Online Q&A Platforms
Chen, Ziyang, Wang, Xiaobin, Jiang, Yong, Liao, Jinzhi, Xie, Pengjun, Huang, Fei, Zhao, Xiang
Question Answering (QA) systems face challenges in handling complex questions that require multi-domain knowledge synthesis. The naive RAG models, although effective in information retrieval, struggle with complex questions that require comprehensive and in-depth answers. The pioneering task is defined as explanatory answer generation, which entails handling identified challenges such as the requirement for comprehensive information and logical coherence within the generated context. To address these issues, we refer to systematic thinking theory and propose SynthRAG, an innovative framework designed to enhance QA performance. SynthRAG improves on conventional models by employing adaptive outlines for dynamic content structuring, generating systematic information to ensure detailed coverage, and producing customized answers tailored to specific user inquiries. This structured approach guarantees logical coherence and thorough integration of information, yielding responses that are both insightful and methodically organized. Empirical evaluations underscore SynthRAG's effectiveness, demonstrating its superiority in handling complex questions, overcoming the limitations of naive RAG models, and significantly improving answer quality and depth. Furthermore, an online deployment on the Zhihu platform revealed that SynthRAG's answers achieved notable user engagement, with each response averaging 5.73 upvotes and surpassing the performance of 79.8% of human contributors, highlighting the practical relevance and impact of the proposed framework. Our code is available at https://github.com/czy1999/SynthRAG .
- Europe > Austria > Vienna (0.14)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Asia > Middle East > Jordan (0.04)
- (11 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
- Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.90)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.75)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)
ASGM-KG: Unveiling Alluvial Gold Mining Through Knowledge Graphs
Gupta, Debashis, Golder, Aditi, Fernendez, Luis, Silman, Miles, Lersen, Greg, Yang, Fan, Plemmons, Bob, Alqahtani, Sarra, Pauca, Paul Victor
Artisanal and Small-Scale Gold Mining (ASGM) is a low-cost yet highly destructive mining practice, leading to environmental disasters across the world's tropical watersheds. The topic of ASGM spans multiple domains of research and information, including natural and social systems, and knowledge is often atomized across a diversity of media and documents. We therefore introduce a knowledge graph (ASGM-KG) that consolidates and provides crucial information about ASGM practices and their environmental effects. The current version of ASGM-KG consists of 1,899 triples extracted using a large language model (LLM) from documents and reports published by both non-governmental and governmental organizations. These documents were carefully selected by a group of tropical ecologists with expertise in ASGM. This knowledge graph was validated using two methods. First, a small team of ASGM experts reviewed and labeled triples as factual or non-factual. Second, we devised and applied an automated factual reduction framework that relies on a search engine and an LLM for labeling triples. Our framework performs as well as five baselines on a publicly available knowledge graph and achieves over 90% accuracy on our ASGM-KG validated by domain experts. ASGM-KG demonstrates an advancement in knowledge aggregation and representation for complex, interdisciplinary environmental crises such as ASGM.
- South America > Brazil (0.14)
- South America > Venezuela (0.04)
- South America > Suriname (0.04)
- (17 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Semantic Networks (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
The Embodied World Model Based on LLM with Visual Information and Prediction-Oriented Prompts
Haijima, Wakana, Nakakubo, Kou, Suzuki, Masahiro, Matsuo, Yutaka
In recent years, as machine learning, particularly for vision and language understanding, has improved, research in embodied AI has also evolved. VOYAGER is a well-known LLM-based embodied AI that enables autonomous exploration in the Minecraft world, but it has issues such as underutilization of visual data and insufficient functionality as a world model. In this research, the possibility of utilizing visual data and the function of the LLM as a world model were investigated with the aim of improving the performance of embodied AI. The experimental results revealed that the LLM can extract necessary information from visual data, and that using this information improves its performance as a world model. The results also suggested that carefully devised prompts could bring out the LLM's function as a world model.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Asia > Japan > Kyūshū & Okinawa > Kyūshū > Fukuoka Prefecture > Fukuoka (0.04)
- Leisure & Entertainment > Games > Computer Games (0.36)
- Materials > Metals & Mining > Gold (0.31)